Machine-readable Document

A machine-readable document is a document whose content can be readily processed by computers. Such documents are distinguished from machine-readable data by virtue of having sufficient structure to provide the necessary context to support the business processes for which they are created.


Definition

Data without context is meaningless and lacks the four essential characteristics of trustworthy business records specified in ISO 15489 (Information and documentation: Records management):

* Reliability
* Authenticity
* Integrity
* Usability

The vast bulk of information is unstructured data and, from a business perspective, that means it is "immature", i.e., Level 1 (chaotic) of the Capability Maturity Model. Such immaturity fosters inefficiency, diminishes quality, and limits effectiveness. Unstructured information is also ill-suited for records management functions, provides inadequate evidence for legal purposes, drives up the cost of discovery in litigation, and makes access and usage needlessly cumbersome in routine, ongoing business processes.

There are at least four aspects to machine-readability:

* First, words or phrases should be discretely delineated (tagged) so that computer software and/or hardware logic can be applied to them as individual conceptual elements.
* Second, the semantics of each element should be specified so that computers can help human beings achieve a common understanding of their meanings and potential usages.
* Third, if the relationships among the individual elements are also specified, computers can automatically apply inferences to them, thereby further relieving human beings of the burden of trying to understand them, particularly for purposes of inquiry, discovery, and analysis.
* Fourth, if the structures of the documents in which the elements occur are also specified, human understanding is further enhanced and the data becomes more reliable for legal and business-quality purposes.

As early as 1983, the U.S. Government Accountability Office (GAO) began emphasizing the benefits of machine-readable information. Even earlier, in 1981, GAO began reporting on the problem of inadequate record-keeping practices in the
U.S. federal government. Such deficiencies are not unique to government, and advances in information technology mean that most information is now "born digital" and thus potentially far more easily managed by automated means. However, in testimony to Congress in 2010, GAO highlighted problems with managing electronic records, and as recently as 2015, GAO continued to report inadequacies in the performance of Executive Branch agencies in meeting records management requirements. Moreover, more than a decade after a major and formerly highly respected auditing firm,
Arthur Andersen, met its demise due to a records destruction scandal, record-keeping practices became a central issue in the 2016 Presidential election.

On January 4, 2011, President Obama signed H.R. 2142, the Government Performance and Results Act (GPRA) Modernization Act of 2010 (GPRAMA), into law as P.L. 111-352. Section 10 of GPRAMA requires U.S. federal agencies to publish their strategic and performance plans and reports in searchable, machine-readable format. Additionally, in 2013, he issued Executive Order 13642, "Making Open and Machine Readable the New Default for Government Information", which addresses government information in general. On July 28, 2016, the Office of Management and Budget (OMB) followed up by directing agencies, in the revised issuance of Circular A-130, to use open, machine-readable formats and to publish "public information online in a manner that promotes analysis and reuse for the widest possible range of purposes", meaning that the information is both publicly accessible and machine-readable. On January 14, 2019, President Trump signed into law H.R. 4174, the OPEN Government Data Act (OGDA), which codifies in law the requirement for agencies to make their public data assets available in machine-readable format. On June 28, 2019, in Circular A-11, OMB expressed intent to begin complying with section 10 of GPRAMA. In support of such policy direction, technological advancement is enabling more efficient and effective management and use of machine-readable electronic records.
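
The four aspects of machine-readability described above (tagging, semantics, relationships, and document structure) can be illustrated with a small sketch. It uses Python's standard `xml.etree.ElementTree` module. The element names (`Plan`, `Goal`, `Objective`, `Name`), the `supports` attribute, and the `SEMANTICS` dictionary are all hypothetical and chosen only for illustration; they are not taken from any particular standard.

```python
import xml.etree.ElementTree as ET

# Aspect 1 (tagging): each phrase is delineated as its own element.
# The vocabulary below is hypothetical, for illustration only.
PLAN_XML = """
<Plan>
  <Goal id="g1">
    <Name>Improve records management</Name>
    <Objective id="o1" supports="g1">
      <Name>Publish plans in machine-readable format</Name>
    </Objective>
  </Goal>
</Plan>
"""

# Aspect 2 (semantics): an agreed meaning for every tag.
SEMANTICS = {
    "Plan": "A strategic plan document",
    "Goal": "A long-term aim of the organization",
    "Objective": "A measurable target that supports a goal",
    "Name": "A short human-readable label",
}


def undefined_tags(root):
    """Return the tags used in the document that have no agreed meaning."""
    return sorted({el.tag for el in root.iter()} - set(SEMANTICS))


def objectives_by_goal(xml_text):
    """Return (objective name, goal name) pairs inferred from the markup."""
    root = ET.fromstring(xml_text)
    # Aspect 4 (structure): reject documents with an unexpected root element.
    if root.tag != "Plan":
        raise ValueError("unexpected document structure")
    pairs = []
    for goal in root.iter("Goal"):
        for objective in goal.iter("Objective"):
            # Aspect 3 (relationships): the 'supports' attribute links
            # an objective to the goal it serves.
            if objective.get("supports") == goal.get("id"):
                pairs.append((objective.findtext("Name"), goal.findtext("Name")))
    return pairs
```

Because the relationship between objective and goal is stated in the markup rather than implied by layout, software can answer a question such as "which objectives support this goal?" without human interpretation.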
Document-oriented databases have been developed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) Recommendation setting forth rules for encoding documents in a format that is both human-readable and machine-readable. Many XML editor tools have been developed and most, if not all, major information technology applications support XML to greater or lesser degrees. The fact that XML itself is an open, standard, machine-readable format makes it relatively easy for application developers to do so. The W3C's accompanying XML Schema (XSD) Recommendation specifies how to formally describe the elements in an XML document. With respect to the specification of XML schemas, the Organization for the Advancement of Structured Information Standards (OASIS) is a leading standards-developing organization.

However, many technical developers prefer to work with JSON, and to define the structure of JSON data for validation, documentation, and interaction control, JSON Schema was developed by the
Internet Engineering Task Force (IETF).

The Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of the presentation of the document, including the text, fonts, graphics, and other information needed to display it. PDF/A is an ISO-standardized version of PDF specialized for use in the archiving and long-term preservation of electronic documents. PDF/A-3 allows embedding of other file formats, including XML, into PDF/A-conforming documents, thus potentially providing the best of both human- and machine-readability. The W3C's XSL-FO (XSL Formatting Objects) markup language is commonly used to generate PDF files.

Metadata, data about data, can be used to organize electronic resources, provide digital identification, and support the archiving and preservation of resources. In well-structured, machine-readable electronic records, the content can be repurposed as both data and metadata. In the context of electronic record-keeping systems, the terms "management" and "metadata" are virtually synonymous. Given proper metadata, records management functions can be automated, thereby reducing the risk of spoliation of evidence and other fraudulent manipulations of records. Moreover, such records can be used to automate the process of auditing data maintained in databases, thereby reducing the risk of single points of failure associated with the Machiavellian concept of a single source of truth.
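
As a rough illustration of how metadata can drive automated records management, the sketch below attaches a creation date, a retention period, and a SHA-256 digest to a record, then uses them to detect alteration and to decide when the record may be disposed of. The field names (`created`, `retention_years`, `sha256`) and the function names are assumptions made for this example, not part of ISO 15489 or any other standard.

```python
import hashlib
from datetime import date


def make_record(content, created, retention_years):
    """Wrap content with hypothetical management metadata."""
    return {
        "content": content,
        "metadata": {
            "created": created.isoformat(),
            "retention_years": retention_years,
            # The digest lets software detect later changes to the content.
            "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        },
    }


def is_intact(record):
    """Integrity check: recompute the digest and compare with the metadata."""
    digest = hashlib.sha256(record["content"].encode("utf-8")).hexdigest()
    return digest == record["metadata"]["sha256"]


def eligible_for_disposal(record, today):
    """Retention check: true once the retention period has fully elapsed."""
    created = date.fromisoformat(record["metadata"]["created"])
    end_year = created.year + record["metadata"]["retention_years"]
    # Tuple comparison avoids constructing an invalid date for 29 February.
    return (today.year, today.month, today.day) >= (
        end_year,
        created.month,
        created.day,
    )
```

In this sketch, a record made with `make_record("Meeting minutes", date(2015, 3, 9), 7)` passes `is_intact` until its content is changed, and becomes eligible for disposal on 9 March 2022. Neither decision requires a person to read the record.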
Blockchain is a new technology for maintaining continuously growing lists of records secured from tampering and revision. A key feature is that every node in a decentralized system has a copy of the blockchain, so there is no single point of failure subject to manipulation and fraud.
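
The tamper-evidence property can be sketched with a simple hash-linked list, in which each block stores the hash of the previous one. This is only a minimal illustration of the linking idea. It omits the networking, replication across nodes, and consensus that an actual blockchain requires, and the block layout and function names are assumptions made for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, record, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "record": record, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, record):
    """Append a record, linking it to the block before it."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {
            "index": index,
            "record": record,
            "prev_hash": prev_hash,
            "hash": block_hash(index, record, prev_hash),
        }
    )


def verify(chain):
    """Return True only if every block is unaltered and correctly linked."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["record"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True
```

Changing the record in any earlier block changes its recomputed hash, so `verify` fails. In a decentralized system, nodes holding unaltered copies would expose such a change.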


See also

* Budapest Declaration on Machine Readable Travel Documents
* Comparison of XML editors
* Four corners (law)
* Integrity and particularly Data integrity
* Linked data
* Machine-readable passport
* Markup language
* Open data
* Reliability (statistics), Data integrity, Reliability (computer networking), and Reliability (research methods)
* Strategy Markup Language (StratML)
* Structured document
* Tag (metadata)
* Universal Business Language (UBL)
* XBRL (eXtensible Business Reporting Language)


External links


* OMB M-13-13, Open Data Policy: Managing Information as an Asset, which requires agencies to use open, machine-readable data format standards
* January 2005, which outlines the characteristics of trustworthy records.
* Driving a Stake in the Heart of the Capone Consultancy Method of Records Management: Best Practices for Correcting Non-Records Non-Policy Nonsense, March 9, 2015
* The U.S. Code, which includes the term "machine-readable" over 50 times as of September 10, 2016